Comparing classical and embodied multimodal fusion for human-robot interaction
Author
Abstract
A robot that interacts with a human must be able to interpret information from various input channels: it needs to understand and analyse the human's utterances, it has to keep track of its environment using sensors, and it needs to incorporate background knowledge about the task it was built for. Typically, a human-robot interaction system has various specialised components that implement these abilities. The robot therefore also needs to merge the information from its input channels so that it can complete its assigned task. This integration of information from input channels is called multimodal fusion. This thesis presents two approaches to multimodal fusion for a robot that cooperates with a human partner. The first approach, called classical multimodal fusion, focuses on processing human utterances: the robot processes the speech and gestures of its human partner using methods from classical artificial intelligence to yield logical representations of the utterances, and these representations are then enhanced with further information from the robot's other input modalities. In the second approach, called embodied multimodal fusion, the robot instead generates representations for its own actions in relation to objects in its environment; here, the system uses the data from its input channels to evaluate the relevance of its own actions in a given context. After a literature review, this thesis discusses the theoretical basis of both multimodal fusion approaches and shows how these methods can be implemented on a robot that works together with a human on a common construction task, for which it processes multimodal input. These implementations were used in three human-robot interaction studies, in which naïve subjects worked together with the robot. The experiments were designed to study different aspects of joint action between human and robot.
The results of the experiments reveal several interesting findings. The first experiment studies how the robot can explain building plans to the human; its results show that users preferred a plan explanation strategy in which the robot first names the target object and then explains the individual building steps. Both the first and the second experiment study the generation of referring expressions in two different contexts; the results of these studies suggest that participants rate the robot as a better dialogue partner when it makes full use of context information to generate referring expressions. Finally, the third experiment studies how humans perceive different roles of the robot in the interaction. It shows that users accept the robot equally well as an instructor or as an equal partner and simply adjust their own behaviour to the robot's role.
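The contrast between the two fusion approaches summarised above can be illustrated with a short sketch. This is a hypothetical, heavily simplified illustration, not the thesis's actual implementation: all function names, data structures, and scoring rules here are assumptions chosen for clarity. Classical fusion starts from a logical representation of the human's utterance and enriches it with other modalities (here, a pointing gesture grounded against a scene model); embodied fusion instead scores the robot's own candidate actions by their relevance in the current context.

```python
# Hypothetical sketch of the two fusion styles; names and structures are
# illustrative assumptions, not the system described in the thesis.

def classical_fusion(speech_form, gesture, scene):
    """Classical fusion: take a logical form for the utterance and
    enhance it with information from other modalities."""
    form = dict(speech_form)  # e.g. {"act": "take", "object": "cube"}
    if gesture and gesture.get("type") == "point":
        # Ground the referent using the pointing gesture and the scene model.
        candidates = [o for o in scene if o["pos"] == gesture["target"]]
        if candidates:
            form["referent"] = candidates[0]["id"]
    return form

def embodied_fusion(actions, context):
    """Embodied fusion: score the robot's own actions by their relevance
    in the current multimodal context and select the most relevant one."""
    def relevance(action):
        score = 0.0
        if action["object"] in context.get("visible", []):
            score += 1.0  # object is currently perceivable
        if action["object"] == context.get("referred"):
            score += 2.0  # object was mentioned by the human
        return score
    return max(actions, key=relevance)

scene = [{"id": "cube-1", "pos": (3, 1)}, {"id": "bolt-7", "pos": (0, 2)}]
utterance = classical_fusion({"act": "take", "object": "cube"},
                             {"type": "point", "target": (3, 1)}, scene)
action = embodied_fusion(
    [{"name": "grasp", "object": "cube-1"}, {"name": "grasp", "object": "bolt-7"}],
    {"visible": ["cube-1", "bolt-7"], "referred": "cube-1"},
)
print(utterance["referent"], action["object"])  # prints: cube-1 cube-1
```

The sketch highlights the directional difference: classical fusion flows from the human's utterance towards a grounded logical form, while embodied fusion flows from the robot's own action repertoire towards a context-dependent choice.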
Similar references
Combining Classical and Embodied Multimodal Fusion for Human-Robot Interaction
Classical artificial intelligence (CAI) and embodied cognition (EC) were successfully applied in different areas: CAI has its strength in fields such as planning or high-level cognition, which require a precise computation that involves logical inference; EC produced excellent approaches for sensori-motor coupling, which requires a robust and flexible computation. The question that we are follo...
Using Embodied Multimodal Fusion to Perform Supportive and Instructive Robot Roles in Human-Robot Interaction
We present a robot that is working with humans on a common construction task. In this kind of interaction, it is important that the robot can perform different roles in order to realise an efficient collaboration. For this, we introduce embodied multimodal fusion, a new approach for processing data from the robot’s input modalities. Using this method, we implemented two different robot roles: t...
Achieving Multimodal Cohesion during Intercultural Conversations
How do English as a lingua franca (ELF) speakers achieve multimodal cohesion on the basis of their specific interests and cultural backgrounds? From a dialogic and collaborative view of communication, this study focuses on how verbal and nonverbal modes cohere during intercultural conversations. The data include approximately 160 minutes of transcribed video recordings of ELF interactions ...
Multimodal Feedback from Robots and Agents in a Storytelling Experiment
In this project, which lies at the intersection between Human-Robot Interaction (HRI) and Human-Computer Interaction (HCI), we have examined the design of an open-source, real-time software platform for controlling the feedback provided by an AIBO robot and/or by the GRETA Embodied Conversational Agent, when listening to a story told by a human narrator. Based on ground truth data obtained from...
Towards Formal Multimodal Analysis of Emotions for Affective Computing
Social robotics concerns robotic systems and their interaction with humans. Social robots have applications in elderly care, health care, home care, customer service, and reception in industrial settings. Human-Robot Interaction (HRI) requires a better understanding of human emotion. There are few multimodal fusion systems that integrate even a limited amount of facial expression, speech, and gesture analysi...